Approximating and testing k-histogram distributions in sub-linear time
Authors
Abstract
A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piecewise constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection of samples from a distribution p, find a k-histogram that (approximately) minimizes the ℓ2 distance to the distribution p. We give time- and sample-efficient algorithms for this problem. We further provide algorithms that distinguish distributions that have the property of being a k-histogram from distributions that are ε-far from any k-histogram, in the ℓ1 distance and the ℓ2 distance respectively.
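The Python sketch below illustrates the ℓ2 objective. It is not the sub-linear algorithm of the paper. It first forms the empirical distribution from samples, then finds the k-histogram closest to it in squared ℓ2 distance using the classical O(k·n²) dynamic program over interval split points. On each interval the best constant is the mean of the empirical probabilities, so the fitted histogram still sums to 1. The domain is assumed to be {0, ..., n-1}, and the function names `empirical_distribution` and `best_k_histogram` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def empirical_distribution(samples: Sequence[int], n: int) -> List[float]:
    """Empirical pmf over the (assumed) domain {0, ..., n-1}."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[i] / m for i in range(n)]


def best_k_histogram(
    q: Sequence[float], k: int
) -> Tuple[float, List[Tuple[int, int, float]]]:
    """Best k-piece constant fit to q under squared l2 error.

    Returns (error, pieces); each piece (a, b, v) means q[a:b] is replaced by v.
    Exhaustive O(k * n^2) dynamic program, not a sub-linear method.
    """
    n = len(q)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Prefix sums of q and q^2 give each interval's cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(q):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(a: int, b: int) -> float:
        # Squared error of replacing q[a:b] by its mean.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    # best[j][b]: smallest error covering q[0:b] with exactly j nonempty pieces.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for b in range(j, n + 1):
            for a in range(j - 1, b):
                c = best[j - 1][a] + cost(a, b)
                if c < best[j][b]:
                    best[j][b] = c
                    split[j][b] = a
    # Walk the split table backwards to recover the k intervals.
    pieces = []
    b = n
    for j in range(k, 0, -1):
        a = split[j][b]
        pieces.append((a, b, (s1[b] - s1[a]) / (b - a)))
        b = a
    pieces.reverse()
    return best[k][n], pieces
```

For example, `best_k_histogram([0.1, 0.1, 0.1, 0.3, 0.3, 0.1], 3)` recovers the intervals [0, 3), [3, 5) and [5, 6) with error close to 0. The running time is quadratic in n, and that cost is what sub-linear approaches aim to avoid. The sketch only makes the objective concrete.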
Similar resources
Learning k-modal distributions via testing
A k-modal probability distribution over the domain {1, ..., n} is one whose histogram has at most k “peaks” and “valleys.” Such distributions are natural generalizations of monotone (k = 0) and unimodal (k = 1) probability distributions, which have been intensively studied in probability theory and statistics. In this paper we consider the problem of learning an unknown k-modal distribution. Th...
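One concrete reading of "at most k peaks and valleys" is to count how often the pmf switches between increasing and decreasing. The Python sketch below assumes the pmf is given explicitly as a list. It also assumes that flat stretches (equal consecutive values) are skipped rather than counted. The helper names `count_direction_changes` and `is_k_modal` are hypothetical. Under this reading a monotone pmf gives 0 and a unimodal one gives at most 1, which matches the k = 0 and k = 1 cases described in the abstract.

```python
from typing import Sequence


def count_direction_changes(p: Sequence[float]) -> int:
    """Number of peaks plus valleys in p (assumption: plateaus are skipped)."""
    changes = 0
    direction = 0  # +1 rising, -1 falling, 0 not yet determined
    for prev, cur in zip(p, p[1:]):
        if cur == prev:
            continue  # flat step: no change of direction
        step = 1 if cur > prev else -1
        if direction != 0 and step != direction:
            changes += 1
        direction = step
    return changes


def is_k_modal(p: Sequence[float], k: int) -> bool:
    """True if p has at most k peaks and valleys under the reading above."""
    return count_direction_changes(p) <= k
```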
Approximating the Distributions of Singular Quadratic Expressions and their Ratios
Noncentral indefinite quadratic expressions in possibly non-singular normal vectors are represented in terms of the difference of two positive definite quadratic forms and an independently distributed linear combination of standard normal random variables. This result also applies to quadratic forms in singular normal vectors for which no general representation is currently available. The ...
Histogram analysis: a useful tool for tissue characterization in brain CT
Introduction: Pixel value in computed tomography (CT) gives the average linear attenuation coefficient of the scanned material in the path of the x-ray beam, being normalized to that of water. It is known that attenuation coefficient or HU value is a function of the chemical characteristic of the material and of the x-ray energy. The CT image shows the HU value by a shade of gr...
Understanding and Improving Residual Distributions for Linear Poisson Models
The method of Linear Poisson Models (LPMs) is able to construct approximating linear models of histogram data behaviour based upon the assumption of independent Poisson noise. Crucially, the method has been developed with associated techniques for the assessment of the effects of noise on both model construction and subsequent use of these models in quantitative data analysis. As a consequence i...
Near-Optimal Closeness Testing of Discrete Histogram Distributions
We investigate the problem of testing the equivalence between two discrete histograms. A k-histogram over [n] is a probability distribution that is piecewise constant over some set of k intervals over [n]. Histograms have been extensively studied in computer science and statistics. Given a set of samples from two k-histogram distributions p, q over [n], we want to distinguish (with high probabi...
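When a pmf is written out in full, the smallest k for which it is a k-histogram equals the number of maximal runs of equal consecutive values. The Python sketch below computes that count. It compares values exactly, with no tolerance, and assumes k does not exceed n, so a histogram with fewer pieces can always be split further to use exactly k. The helper names `min_histogram_pieces` and `is_k_histogram` are hypothetical. This is an exact check on a fully known distribution, not a sample-based tester.

```python
from typing import Sequence


def min_histogram_pieces(p: Sequence[float]) -> int:
    """Smallest number of intervals on which p is constant (exact comparison)."""
    if len(p) == 0:
        return 0
    pieces = 1
    for prev, cur in zip(p, p[1:]):
        if cur != prev:
            pieces += 1  # a new maximal run starts at cur
    return pieces


def is_k_histogram(p: Sequence[float], k: int) -> bool:
    """True if p is piecewise constant on at most k intervals (assumes k <= len(p))."""
    return min_histogram_pieces(p) <= k
```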